Due to divergence, genetic variation is generally believed to be high among distantly related strains, low among closely related ones, and little or none within the same classified clonal group. Several recent genome-wide studies, however, revealed that significant genetic variation resides in a considerable number of genes among strains with identical MLST (multilocus sequence typing) types, and that much of this variation was introduced by homologous recombination. Recognizing and understanding genomic variation within clonal bacterial groups could shed new light on the evolutionary path of infectious agents and on the emergence of particularly pathogenic or virulent variants. This commentary presents our recent contributions to this line of work.

Introduction 

Nucleotide sequences diverge over time through the combined effects of point mutation and homologous recombination. Recombination changes a region of contiguous bases in a single event and was generally assumed to be rare in bacteria. However, there is growing evidence that homologous recombination has a significant impact on sequence diversification during bacterial genome evolution. A recent analysis of multilocus sequence typing (MLST) data from 46 bacterial and two archaeal species identified 27 species (56%) in which homologous recombination contributed more nucleotide changes than point mutation. The rapid genetic change introduced by homologous recombination could facilitate ecological adaptation and drive pathogenesis in bacterial pathogens.
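The comparison above is usually summarized as the ratio r/m: the number of nucleotide changes introduced by recombination divided by the number introduced by point mutation, so "recombination contributed more changes than mutation" means r/m > 1. The sketch below shows that bookkeeping only; the `SpeciesEstimate` class and all counts are hypothetical, and estimating r and m in the first place is the hard part that it does not attempt.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SpeciesEstimate:
    """Hypothetical per-species tallies of nucleotide changes by process."""
    name: str
    recombination_changes: float  # r: changes attributed to recombination
    mutation_changes: float        # m: changes attributed to point mutation

    @property
    def r_over_m(self) -> float:
        """Relative effect of recombination versus point mutation (r/m)."""
        return self.recombination_changes / self.mutation_changes


def recombination_dominated(estimates: list[SpeciesEstimate]) -> tuple[int, float]:
    """Count species with r/m > 1 and return (count, percentage of all species)."""
    count = sum(1 for e in estimates if e.r_over_m > 1.0)
    return count, 100.0 * count / len(estimates)


# Illustrative values only: 27 of 48 species with r/m > 1 gives 56.25%.
species = [SpeciesEstimate(f"s{i}", 2, 1) for i in range(27)]
species += [SpeciesEstimate(f"t{i}", 1, 2) for i in range(21)]
print(recombination_dominated(species))  # prints (27, 56.25)
```

A species with r/m exactly 1 is not counted, matching the "more nucleotide changes than point mutation" wording.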

Currently, the MLST scheme, which uses DNA fragments from seven housekeeping genes, is routinely used to characterize bacterial isolates. The standard MLST scheme has also been extended to resolve fine-scale relationships and to further subdivide identical multilocus sequence types (STs), using more loci or large amounts of shared genomic sequence. Given how commonly homologous recombination occurs, it is crucial to investigate its genome-wide extent, which would also help reconstruct strain histories and track the spread of emerging pathogens.
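To make concrete what "identical STs" means here: an ST is simply a label for one combination of allele numbers at the seven loci, so two isolates share an ST exactly when all seven allele numbers match, regardless of what the rest of their genomes look like. The sketch below assumes the seven E. coli housekeeping loci commonly used for this purpose (adk, fumC, gyrB, icd, mdh, purA, recA); the allele numbers and ST labels in `ST_TABLE` are invented for illustration and do not correspond to any real database entry.

```python
from typing import Optional

# Seven housekeeping loci assumed for illustration (E. coli scheme).
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical lookup table: allele numbers and ST labels are invented.
ST_TABLE = {
    (1, 1, 1, 1, 1, 1, 1): "ST-hypo-1",
    (1, 2, 1, 1, 1, 1, 1): "ST-hypo-2",
}


def allelic_profile(alleles: dict[str, int]) -> tuple[int, ...]:
    """Order allele numbers by the scheme's locus order; every locus must be typed."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError(f"untyped loci: {missing}")
    return tuple(alleles[locus] for locus in LOCI)


def assign_st(alleles: dict[str, int], st_table: dict[tuple[int, ...], str]) -> Optional[str]:
    """Return the ST for a complete profile, or None if the profile is not in the table."""
    return st_table.get(allelic_profile(alleles))


def same_st(a: dict[str, int], b: dict[str, int]) -> bool:
    """Two isolates share an ST exactly when all seven allele numbers match."""
    return allelic_profile(a) == allelic_profile(b)


iso1 = dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1)))
iso2 = dict(iso1, fumC=2)  # differs at a single locus
print(assign_st(iso1, ST_TABLE), assign_st(iso2, ST_TABLE), same_st(iso1, iso2))
```

Because the ST depends only on these seven fragments, any variation elsewhere in the genome is invisible to it, which is the gap the genome-wide analyses below address.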

Identification and Quantification of Nonvertically Acquired Genes via Recombination within Identical STs

Identifying recombinational exchanges in closely related strains is challenging because exchanges involving only a small number of nucleotides may be mistaken for point mutations. Guttman and Dykhuizen (1994) successfully examined the clonal divergence of E. coli strains in ECOR group A by taking divergence time and mutation rate into account, and showed that at four loci recombination had occurred at a rate 50-fold higher than mutation. Feil et al. (2000) estimated the ancestral allele for isolates that differ at only one of the seven MLST loci, and assigned each difference to recombination or mutation based on the number of nucleotides derived from the ancestral allele and on whether those nucleotides are novel in the population.

Figure 1. Inference of homologous recombination in strains with identical STs. Assuming that nucleotide substitutions follow a binomial distribution, there is a probability of observing no nucleotide change in the seven MLST loci. Setting $(1-\mu)^n = 0.001$, where $n$ is the number of nucleotides in the seven MLST loci, gives $\mu$, the upper bound of genome-wide nucleotide divergence at the 0.001 significance level given no change in the seven MLST loci. At genome-wide divergence $\mu$, genes with more than the expected number of nucleotide changes at the 0.001 significance level were deemed nonvertically acquired.
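The single-locus-variant approach described above can be sketched as a simple decision rule. This is a simplified reading, not the published procedure: it assumes the ancestral allele is already known and that alleles are aligned to equal length. A variant differing from the ancestor at a single site, with a nucleotide that no other allele in the population carries at that site, is scored as a point mutation; anything else is scored as recombination. The function names are hypothetical.

```python
from typing import Sequence


def differing_sites(a: str, b: str) -> list[int]:
    """Positions at which two aligned alleles differ."""
    if len(a) != len(b):
        raise ValueError("alleles must be aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def classify_slv(ancestral: str, variant: str, other_alleles: Sequence[str]) -> str:
    """Score a single-locus variant as 'mutation', 'recombination' or 'identical'.

    other_alleles: alleles of this locus seen elsewhere in the population
    (excluding the variant itself).
    """
    sites = differing_sites(ancestral, variant)
    if not sites:
        return "identical"
    if len(sites) == 1:
        i = sites[0]
        # A derived nucleotide found in no other allele is treated as novel.
        if all(allele[i] != variant[i] for allele in other_alleles):
            return "mutation"
    # Several derived sites, or a nucleotide already present in the population.
    return "recombination"


print(classify_slv("ACGTACGT", "ACGTACGA", ["ACGTACGT", "TCGTACGT"]))  # mutation
print(classify_slv("ACGTACGT", "ACGTACGA", ["GCGTACGA"]))              # recombination
```

The rule needs a reliable ancestral allele and a population sample, which is exactly the requirement the approach in the next paragraph avoids.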

We adopted a new approach (illustrated in Fig. 1) to identify recombinant genes in Neisseria meningitidis strains with identical STs. It does not require estimating divergence time or ancestral alleles and can be applied to any two strains with identical STs. In brief, nucleotide substitution was assumed to follow a binomial distribution, and an upper bound of genome-wide divergence (μ) by point mutation was calculated given no observed substitution at any nucleotide site of the seven MLST loci. This estimated maximum genome-wide divergence was then used as a benchmark to compute, for each gene in the genome, a P-value for the hypothesis that its observed nucleotide changes can be explained by point mutation. Genes with more than the expected number of nucleotide changes at a significance level of 0.001 were deemed recombinant. Our results revealed that up to 19% of the genes commonly present in N. meningitidis strains with identical STs have been affected by homologous recombination.
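The two steps of the test can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' code: μ is solved from $(1-\mu)^n = \alpha$ with α = 0.001, and each gene's P-value is the binomial upper tail P(X ≥ k) for X ~ Binomial(L, μ), where L is the aligned gene length and k the observed number of differences. The total MLST length n = 3300 and the three genes are hypothetical values, and all function names are illustrative.

```python
import math


def max_genome_divergence(n_mlst_sites: int, alpha: float = 0.001) -> float:
    """Upper bound mu on per-site divergence such that (1 - mu)**n == alpha."""
    # 1 - alpha**(1/n), written with expm1 for numerical accuracy.
    return -math.expm1(math.log(alpha) / n_mlst_sites)


def binom_upper_tail(k: int, length: int, mu: float) -> float:
    """P(X >= k) for X ~ Binomial(length, mu), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if k > length:
        return 0.0
    log_mu, log_not_mu = math.log(mu), math.log1p(-mu)
    log_n_fact = math.lgamma(length + 1)
    total = 0.0
    for i in range(k, length + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(length - i + 1)
                   + i * log_mu + (length - i) * log_not_mu)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def flag_recombinant_genes(genes: dict[str, tuple[int, int]], mu: float,
                           alpha: float = 0.001) -> list[str]:
    """Genes with too many differences to be explained by point mutation at divergence mu.

    genes maps gene name -> (aligned length in bp, observed nucleotide differences).
    """
    return sorted(name for name, (length, diffs) in genes.items()
                  if binom_upper_tail(diffs, length, mu) < alpha)


# Hypothetical inputs: n = 3300 MLST sites and the gene table are assumed values.
mu = max_genome_divergence(3300)
genes = {"geneA": (1000, 0), "geneB": (1000, 3), "geneC": (1000, 25)}
print(round(mu, 5), flag_recombinant_genes(genes, mu))  # prints 0.00209 ['geneC']
```

Using the upper bound μ rather than a point estimate makes the test conservative: a gene is flagged only if its differences are implausible even under the highest divergence compatible with seven identical MLST loci.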

In another study of E. coli O104 (ST678) genomes, we visualized recombinant genes by plotting the pairwise DNA distance of orthologous genes along the genome, and identified 167 genes in three gene clusters that have likely undergone homologous recombination. A reanalysis of the orthologs between the E. coli ON2010 and 55989 genomes (the latter labeled Ec55989 hereafter to avoid confusion), using both pairwise DNA distance and the P-values described in ref 15, yielded remarkably similar results (Fig. 2). Indeed, nucleotide divergence between two genomes has been used successfully to detect homologous recombination in other studies, one of which compared two E. coli ST131 strains. A higher proportion of core genes (at least 9%) was found to be affected by homologous recombination in the E. coli ST131 genomes than in the E. coli ST678 genomes (Fig. 2). The findings in both N. meningitidis and E. coli show extensive genomic variation within identical STs. Because many bacterial species have a level of recombinogenicity comparable to or higher than that of N. meningitidis or E. coli, extensive genomic variation within identical STs should be expected in many bacterial species.
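The distance-along-the-genome view can be sketched as two steps: a per-gene pairwise distance, then a scan for runs of consecutive genes (in genome order) whose distance exceeds a threshold. PHYLIP's DNADIST offers several substitution models; this sketch swaps in the simple Jukes-Cantor (JC69) correction for brevity. The function names, threshold and minimum run length are hypothetical choices, not the settings used in the study.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance; infinite when saturated (p >= 0.75)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def elevated_runs(distances: list[float], threshold: float,
                  min_genes: int = 1) -> list[tuple[int, int]]:
    """(start, end) indices of consecutive genes whose distance exceeds threshold."""
    runs, start = [], None
    for i, d in enumerate(distances):
        if d > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_genes:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(distances) - start >= min_genes:
        runs.append((start, len(distances) - 1))
    return runs


# Hypothetical per-gene distances in genome order.
print(elevated_runs([0, 0, 0.01, 0.02, 0, 0.03, 0.04, 0.05, 0], threshold=0.005))
```

In strains with identical STs most genes sit at or near zero, so even a modest threshold separates candidate recombinant clusters from the background.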

It is important to note that the high genomic variation discovered within identical STs should not be interpreted as an artifact of these studies. Instead, it could be explained by many nonvertically acquired genes within identical STs being deleterious or transiently adaptive and evolving rapidly. In fact, the ratio of recombination to mutation rates was higher in comparisons of clonally related strains than in comparisons of relatively broadly sampled strains from the corresponding species. Such a discrepancy between estimated recombination-to-mutation ratios highlights the need for a population genetics framework for studying recombination and bacterial genome evolution.

Genomic Regions Involved in Recombination

Among the three clusters of recombinant genes we identified in E. coli O104, one contained 125 genes and was likely acquired through direct chromosomal homologous recombination specific to the ON2010 strain. These 125 genes fell into 20 different functional categories, and 70 of them were present in all 57 E. coli and Shigella genomes studied. This is consistent with the conclusion that genes from all functional categories are subject to DNA exchange. Furthermore, the nearest phylogenetic neighbors of these genes were not clustered in a single phylogenetic group. We hypothesized that extensive recombination with a broad spectrum of strains took place in one genome, and that this highly mosaic genome then recombined with the precursor of the ON2010 genome.
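The "nearest phylogenetic neighbor" check can be approximated by a distance-based shortcut: for each recombinant gene, take the reference genome whose ortholog is closest to the query, then tally which phylogroup those nearest neighbors fall in. A single dominant phylogroup would point to one donor lineage, while a spread across groups fits the mosaic-donor hypothesis. The sketch assumes a precomputed distance table and a genome-to-phylogroup map; the genome names, distances and the shortcut itself are hypothetical stand-ins for a full phylogenetic analysis.

```python
from collections import Counter


def nearest_neighbor(distances: dict[str, float]) -> str:
    """Reference genome with the smallest distance; ties resolve to the first listed."""
    return min(distances, key=distances.__getitem__)


def neighbor_phylogroups(gene_distances: dict[str, dict[str, float]],
                         phylogroup_of: dict[str, str]) -> Counter:
    """Tally the phylogroup of each gene's nearest neighbor.

    gene_distances maps gene -> {reference genome: distance to the query ortholog}.
    phylogroup_of maps reference genome -> phylogroup label.
    """
    return Counter(phylogroup_of[nearest_neighbor(row)] for row in gene_distances.values())


# Hypothetical data: three reference genomes in three phylogroups.
phylogroup_of = {"ref1": "A", "ref2": "B1", "ref3": "B2"}
gene_distances = {
    "g1": {"ref1": 0.001, "ref2": 0.02, "ref3": 0.03},
    "g2": {"ref1": 0.02, "ref2": 0.002, "ref3": 0.03},
    "g3": {"ref1": 0.02, "ref2": 0.03, "ref3": 0.0},
    "g4": {"ref1": 0.005, "ref2": 0.03, "ref3": 0.04},
}
print(neighbor_phylogroups(gene_distances, phylogroup_of))
```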

The other two clusters of recombinant genes in E. coli O104 were located in prophage regions, but the genes in these two clusters were identical between the ON2010 and Ec55989 genomes. Notably, the reanalysis with more single-copy genes (details in Fig. 2) revealed 5 prophage genes involved in recombination. These prophage genes are not present in all O104 strains or in the outgroup IAI1 strain. This could be explained by frequent recombination of the prophage genes with infecting phages or with different prophages from other bacterial chromosomes. Since all examined O104 genomes have conserved genome synteny, our observations support the argument that homologous (legitimate) recombination drives module exchange between phages. Together, these findings suggest that homologous recombination takes place frequently in both core genes and dispensable genes.

Figure 2. Inferring genes involved in homologous recombination by comparing orthologs between two E. coli strains, ON2010 and Ec55989. (A) DNA distance was measured using DNADIST of the PHYLIP package. (B) P-values were calculated based on the maximum genome-wide divergence given the seven identical MLST loci, as illustrated in Figure 1. For simplicity, P-values smaller than 0.0001 are shown as 0.0001. Genes located in the prophage regions are colored blue. Note that more genes (4207 in total) were examined here than in our previous study (3794 genes), because the previous study focused on genes present in both the O104 strains and the IAI1 strain.

Phylogenomic Consequence 

As the cost of sequencing drops, the characterization of bacterial isolates has used more shared genes or loci and shifted toward phylogenomic analysis. Often, multiple gene alignments are concatenated into a single super-alignment, from which phylogenies are reconstructed using a variety of methods. Such a data set, also known as a supermatrix, has been shown to resolve previously ambiguous or unresolved phylogenies, even in the presence of a low level of horizontal gene transfer in the data set. Unfortunately, the supermatrix approach becomes very sensitive to recombination when applied to strains with identical STs, because genuine sequence diversity is limited. The concatenated sequences of 3794 genes in the E. coli O104 strains were overwhelmed by the phylogenetic signal of the 125 recombinant genes, as many other genes are identical among the E. coli O104 strains (Fig. 2).
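Why a few recombinant genes can dominate a supermatrix is easy to see by counting where the variable columns come from. The sketch below concatenates per-gene alignments (assumed to contain the same strains, already aligned) and reports the share of variable sites contributed by a chosen subset of genes. Gap characters are treated as ordinary states, and the toy alignments and function names are hypothetical.

```python
def variable_sites(alignment: dict[str, str]) -> int:
    """Number of alignment columns with more than one character state."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in a gene alignment must have equal length")
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def concatenate(gene_alignments: dict[str, dict[str, str]]) -> dict[str, str]:
    """Build a supermatrix; every gene alignment must contain every strain."""
    strains = sorted(next(iter(gene_alignments.values())))
    return {s: "".join(aln[s] for aln in gene_alignments.values()) for s in strains}


def signal_share(gene_alignments: dict[str, dict[str, str]], subset: set[str]) -> float:
    """Fraction of all variable sites in the supermatrix contributed by `subset` genes."""
    total = sum(variable_sites(aln) for aln in gene_alignments.values())
    part = sum(variable_sites(gene_alignments[g]) for g in subset)
    return part / total if total else 0.0


# Toy data: two near-identical genes and one recombinant gene.
genes = {
    "g1": {"X": "ACGT", "Y": "ACGT", "Z": "ACGT"},
    "g2": {"X": "TTTT", "Y": "TTTA", "Z": "TTTT"},
    "rec": {"X": "AAAAAA", "Y": "CCCAAA", "Z": "AAAAAA"},
}
print(concatenate(genes)["Y"], signal_share(genes, {"rec"}))  # prints ACGTTTTACCCAAA 0.75
```

Here the recombinant gene is 6 of 14 columns but supplies 3 of the 4 variable sites, so a tree built from the supermatrix mostly reflects its history.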

The accuracy and robustness of the reconstructed evolutionary relationships can be improved by excluding recombinogenic and incongruent sequences. In fact, removing the 125 recombinant genes from the E. coli O104 data set yielded consistent phylogenetic relationships among O104 strains across different phylogenetic approaches. One interesting finding of our E. coli O104 study is that the number of identical loci, as implemented in BIGSdb, was less sensitive to homologous recombination than the concatenated sequences of all loci. This could be explained by the fact that recombination affected a relatively small number of genes but introduced a substantial amount of diversity into the ON2010 genome. It is further noteworthy that supertrees, another widely used approach for phylogenomic analysis, are not suitable for characterizing strains with identical MLST types: many individual genes are identical or nearly identical, so each individual gene tree contains little or no phylogenetic information.
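The contrast between counting differing loci and counting nucleotide differences can be sketched directly. In a gene-by-gene comparison, a recombinant locus carrying many substitutions still counts as one difference, whereas in a concatenation every substitution counts. The sketch is in the spirit of allele-based comparisons such as those in BIGSdb but is not its code; the `exclude` parameter shows how known recombinant loci might be dropped, and all names and sequences are hypothetical.

```python
def loci_differences(a: dict[str, str], b: dict[str, str],
                     exclude: frozenset = frozenset()) -> int:
    """Number of shared loci whose allele sequences differ (gene-by-gene count)."""
    shared = (a.keys() & b.keys()) - exclude
    return sum(1 for locus in shared if a[locus] != b[locus])


def nucleotide_differences(a: dict[str, str], b: dict[str, str],
                           exclude: frozenset = frozenset()) -> int:
    """Total differing sites across shared loci, as in a concatenated alignment."""
    shared = (a.keys() & b.keys()) - exclude
    total = 0
    for locus in shared:
        if len(a[locus]) != len(b[locus]):
            raise ValueError(f"{locus}: alleles must be aligned to equal length")
        total += sum(x != y for x, y in zip(a[locus], b[locus]))
    return total


# Hypothetical strains: s2 has one SNP in each of three loci;
# s3 has a single recombinant locus (l4) with 40 differences.
base = {"l1": "A" * 50, "l2": "C" * 50, "l3": "G" * 50, "l4": "T" * 50}
s2 = dict(base, l1="C" + "A" * 49, l2="A" + "C" * 49, l3="A" + "G" * 49)
s3 = dict(base, l4="A" * 40 + "T" * 10)
print(loci_differences(base, s2), nucleotide_differences(base, s2))  # prints 3 3
print(loci_differences(base, s3), nucleotide_differences(base, s3))  # prints 1 40
```

By nucleotides, s3 looks far more distant than s2; by loci, it looks closer, which mirrors why the locus count was less sensitive to the ON2010 recombination.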

Homologous Recombination and Pathogenic Adaptation

Homologous recombination can bring together beneficial mutations arising in different genomes and can have a strong impact on ecological adaptation. One well-known example is recombination in the penA gene during the emergence of penicillin resistance in N. meningitidis. Variation in the penA gene corresponding to different levels of penicillin susceptibility has also been observed between N. meningitidis strains with the same MLST type. Furthermore, genetic variation within the same MLST type has been evident in the capsule gene cluster and in genes used as vaccine targets in N. meningitidis. These observations suggest a strong relationship between homologous recombination and pathogenic adaptation involving antibiotic resistance, capsule biosynthesis and vaccine efficacy.

Recombination-mediated pathogenic adaptation was also evident in E. coli. Recombination has affected fimH, which encodes the mannose-specific type 1 fimbrial adhesin, and the resulting fimH variants are associated with distinct fluoroquinolone-resistance profiles in ST131 strains. A survey of the fimH gene in the 57 E. coli and Shigella genomes revealed that ON2010 was the only E. coli O104 genome containing a fimH BLAST hit covering more than 10% of the gene length (Fig. 3). Except for one nucleotide, the fimH sequence in ON2010 was identical to those in E24377A and S88. On the ON2010 genome scaffold, fimH lies immediately upstream of gntP, a fructuronic acid transporter gene present in all E. coli and Shigella genomes. The gntP gene in ON2010 was also found to be involved in homologous recombination (Fig. 2), and, most importantly, the sequences most similar to the ON2010 gntP were also in E24377A and S88 (data not shown). The shared origin of the adjacent fimH and gntP genes in ON2010 suggests that patchily distributed genes involved in pathogenesis could be introduced by homologous recombination of conserved flanking genes.

Figure 3. Sequence alignment of fimH. Only informative sites are shown, with coordinates at the top. The ON2010 sequence and its most similar sequences (differing by one nucleotide) are shown in light green.
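A survey like the fimH screen can be sketched as a coverage filter over BLAST+ tabular output (`-outfmt 6`, whose default twelve columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The sketch keeps hits whose query span exceeds 10% of the query length. It assumes coverage is measured on the query and does not merge multiple HSPs per subject; the query length of 900 bp and the two hit lines are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def hits_above_coverage(tabular_text: str, query_length: int,
                        min_fraction: float = 0.10) -> list[tuple[str, float, int]]:
    """Return (subject, percent identity, query bases covered) for hits above min_fraction."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        covered = abs(int(rec["qend"]) - int(rec["qstart"])) + 1
        if covered / query_length > min_fraction:
            hits.append((rec["sseqid"], float(rec["pident"]), covered))
    return hits


# Hypothetical BLAST output for a 900 bp fimH query against two genomes.
blast_out = (
    "fimH\tgenomeA_contig1\t99.89\t900\t1\t0\t1\t900\t1000\t1899\t0.0\t1657\n"
    "fimH\tgenomeB_contig7\t85.00\t60\t9\t0\t400\t459\t50\t109\t1e-05\t60.0\n"
)
print(hits_above_coverage(blast_out, query_length=900))
```

The second hit spans 60 of 900 query bases (about 7%), so only the first passes the 10% cut.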